A Web-based Kernel Function for Matching Short Text Snippets
Abstract
Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if any, terms in common between two short text snippets. We address this problem by introducing a novel method for measuring the similarity between short text snippets (even those without any overlapping terms) by leveraging web search results to provide greater context for the short texts. In this paper, we define such a similarity kernel function and provide examples of its efficacy. We also show the use of this kernel function in a large-scale system for suggesting related queries to search engine users.
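The idea of comparing two short texts through their web search results can be illustrated with a minimal sketch. The abstract does not spell out how the kernel is built, so the details here are assumptions: each snippet is expanded into a term-frequency vector over the text of its search results, the vector is L2-normalized, and the kernel value is the dot product of the two vectors. The `search` function and the `CANNED_RESULTS` table are hypothetical stand-ins for a real search engine.

```python
import math
from collections import Counter

# Hypothetical stand-in for a web search engine: maps a query to the text of
# a few retrieved documents. A real system would query a search service.
CANNED_RESULTS = {
    "ai": [
        "artificial intelligence research and machine learning",
        "machine learning systems for artificial intelligence",
    ],
    "artificial intelligence": [
        "artificial intelligence and machine learning research",
        "intelligence in machines and learning algorithms",
    ],
    "pizza": [
        "pizza dough recipe with tomato cheese",
        "best cheese pizza restaurants",
    ],
}


def search(query):
    """Return the (canned) result texts for a query; empty list if unknown."""
    return CANNED_RESULTS.get(query.lower(), [])


def expand(query):
    """Build an L2-normalized term-frequency vector from the search results."""
    counts = Counter()
    for doc in search(query):
        counts.update(doc.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    if norm == 0:
        return {}
    return {term: v / norm for term, v in counts.items()}


def kernel(x, y):
    """Similarity of two short texts: dot product of their expanded vectors."""
    vx, vy = expand(x), expand(y)
    return sum(w * vy.get(term, 0.0) for term, w in vx.items())


if __name__ == "__main__":
    print(kernel("AI", "artificial intelligence"))
    print(kernel("AI", "pizza"))
```

In this toy setup, "AI" and "artificial intelligence" share no literal terms, yet their expanded vectors overlap heavily, so the kernel scores them as more similar than "AI" and "pizza". A plain cosine over the raw snippets would score both pairs as zero.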
Related papers
Corpus-based and Knowledge-based Measures of Text Semantic Similarity
This paper presents a method for measuring the semantic similarity of texts, using corpus-based and knowledge-based measures of similarity. Previous work on this problem has focused mainly on either large documents (e.g. text classification, information retrieval) or individual words (e.g. synonymy tests). Given that a large fraction of the information available today, on the Web and elsewhere,...
A procedure for Web Service Selection Using WS-Policy Semantic Matching
In general, policy-based approaches play an important role in the management of web services, for instance in the choice of semantic web services and, in particular, quality of service (QoS). The present research work illustrates a procedure for selecting among functionally similar web services based on WS-Policy semantic matching. In this study, the procedure of WS-Policy publi...
A Discrete Particle Swarm Optimizer for Clustering Short-text Corpora
Work on “short-text clustering” is relevant, particularly if we consider the current/future mode for people to use ‘small-language’, e.g. blogs, text-messaging, snippets, etc. Potential applications in different areas of natural language processing may include re-ranking of snippets in information retrieval, and automatic clustering of scientific texts available on the Web. Despite its relevanc...
Semantically driven snippet selection for supporting focused web searches
Millions of people access the plentiful web content to locate information that is of interest to them. Searching is the primary web access method for many users. During search, the users visit a web search engine and use an interface to specify a query (typically comprising a few keywords) that best describes their information need. Upon query issuing, the engine’s retrieval modules identify a ...
Effectiveness of web search results for genre and sentiment classification
The motivation of this study is to enhance general topical search with a sentiment-based one where the search results (called snippets) returned by the Web search engine are clustered by sentiment categories. Firstly we developed an automatic method to identify product review documents using the snippets (summary information that includes the URL, title, and summary text), which is considered a...